Appendix B — Assignment 3 (Data structures)

Instructions

  1. You may talk to a friend, discuss the questions and potential directions for solving them. However, you need to write your own solutions and code separately, and not as a group activity.

  2. Do not write your name on the assignment.

  3. Write your code in the Code cells and your answer in the Markdown cells of the Jupyter notebook. Ensure that the solution is written neatly enough to understand and grade.

  4. Use Quarto to print the .ipynb file as HTML. You will need to open the command prompt, navigate to the directory containing the file, and use the command: quarto render filename.ipynb --to html. Submit the HTML file.

  5. There are 5 points for cleanliness and organization. The code should be commented and clearly written with intuitive variable names. For example, use variable names such as number_input, factor, hours, instead of a, b, xyz, etc.

  6. The assignment is worth 100 points, and is due on 21st October 2022 at 11:59 pm.

C 1 GDP of the USA

USA’s GDP per capita from 1960 to 2021 is given by the tuple T in the code cell below. The values are arranged in ascending order of the year, i.e., the first value is for 1960, the second value is for 1961, and so on.

Code
T = (3007, 3067, 3244, 3375,3574, 3828, 4146, 4336, 4696, 5032,5234,5609,6094,6726,7226,7801,8592,9453,10565,11674,12575,13976,14434,15544,17121,18237,19071,20039,21417,22857,23889,24342,25419,26387,27695,28691,29968,31459,32854,34515,36330,37134,37998,39490,41725,44123,46302,48050,48570,47195,48651,50066,51784,53291,55124,56763,57867,59915,62805,65095,63028,69288)

C.1 1(a)

C.1.1 1(a)(i)

Use list comprehension to produce a list of the gaps between consecutive entries in T, i.e., the increase in GDP per capita with respect to the previous year. The list of gaps should look like: [60, 177, …].

(6 points)
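
The pairing pattern behind such a comprehension can be sketched on a small made-up tuple. This is only an illustration: sample_values and sample_gaps are hypothetical names and the numbers are not the assignment data.

```python
# Hypothetical toy data, not the GDP tuple T from the assignment
sample_values = (10, 13, 19, 18, 25)

# Pair each entry with the one that follows it and take the difference
sample_gaps = [later - earlier
               for earlier, later in zip(sample_values, sample_values[1:])]

print(sample_gaps)  # [3, 6, -1, 7]
```

Using zip with a shifted slice avoids index arithmetic; indexing with range(1, len(...)) is an equally valid choice.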

C.1.2 1(a)(ii)

Use the list developed in 1(a)(i) to find the maximum gap size, i.e., the maximum increase in GDP per capita.

(2 points)

C.1.3 1(a)(iii)

Using list comprehension with the list developed in 1(a)(i), find the percentage of gaps that have size greater than $1000.

(6 points)
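
A percentage of this kind is the count of entries passing a filter divided by the total count. Below is a minimal sketch on made-up numbers, where sample_gaps and threshold are hypothetical and chosen only for illustration.

```python
# Hypothetical toy gaps and threshold, not the assignment values
sample_gaps = [3, 6, -1, 7]
threshold = 5

# Keep only the gaps above the threshold, then compare the counts
large_gaps = [gap for gap in sample_gaps if gap > threshold]
percent_large = 100 * len(large_gaps) / len(sample_gaps)

print(percent_large)  # 50.0
```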

C.2 1(b)

C.2.1 1(b)(i)

Create a dictionary D, where the key is the year, and value for the key is the increase in GDP per capita in that year with respect to the previous year, i.e., the gaps computed in 1(a)(i).

(6 points)

C.2.2 1(b)(ii)

Use the dictionary D to find the year in which the increase in GDP per capita over the previous year was the largest. Use the list comprehension method.

(6 points)

Hint: […… for …. in D.items() if ……]
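
The shape of the hint can be illustrated on a small made-up dictionary. This is a sketch only: sample_changes and its contents are hypothetical and are not the dictionary D.

```python
# Hypothetical toy dictionary mapping a year to a change, not the real D
sample_changes = {2001: 4, 2002: -2, 2003: 9, 2004: 1}

# Find the largest value first, then keep the keys whose value matches it
largest_change = max(sample_changes.values())
years_with_largest = [year
                      for year, change in sample_changes.items()
                      if change == largest_change]

print(years_with_largest)  # [2003]
```

The result is a list because more than one key could share the largest value.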

C.2.3 1(b)(iii)

Use the dictionary D to find the years when the GDP per capita decreased with respect to the previous year. Use the list comprehension method.

(6 points)

D 2 Ted Talks

D.1 2(a)

Read the file TED_Talks.json on TED talks using the code below. You will get the data in the object TED_Talks_data. Examine the data structure of TED_Talks_data. You will need to know how the data is structured in lists/dictionaries to answer the questions below.

(2 points)

Code
import json
with open("TED_Talks.json", "r") as file:
    TED_Talks_data=json.load(file)
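
One way to explore an unfamiliar JSON object is to check its type, its length, and the keys of its first element. The sketch below does this on a made-up JSON string; sample_text and its fields are hypothetical, and the real TED_Talks.json may be structured differently.

```python
import json

# Hypothetical stand-in for a loaded file; the field names here are invented
sample_text = '[{"title": "A", "tags": ["x", "y"]}, {"title": "B", "tags": []}]'
sample_data = json.loads(sample_text)

# Outermost container: is it a list or a dictionary, and how big is it?
print(type(sample_data))
print(len(sample_data))

# Look inside the first element to see which fields each record carries
first_item = sample_data[0]
print(type(first_item))
print(list(first_item.keys()))
```

The same calls can be applied to TED_Talks_data, going one level deeper wherever a value is itself a list or a dictionary.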

D.2 2(b)

Find the number of talks in the dataset.

(2 points)

D.3 2(c)

Find the headline, speaker and year_filmed of the talk with the highest number of views.

(6 points)

D.4 2(d)

What are the mean and median number of views for a talk? Can we say that the majority of talks (i.e., more than 50% of the talks) have fewer views than the average number of views for a talk? Justify your answer.

(6 points)
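
The relationship between the mean and the median can be seen on a small skewed sample. The sketch below uses made-up counts (sample_counts is hypothetical) and the standard library statistics module. It is not the TED data.

```python
from statistics import mean, median

# Hypothetical toy counts with one very large value
sample_counts = [1, 2, 2, 3, 42]

sample_mean = mean(sample_counts)      # 10
sample_median = median(sample_counts)  # 2

# Share of entries that fall below the mean
below_mean = [count for count in sample_counts if count < sample_mean]
share_below_mean = len(below_mean) / len(sample_counts)

print(sample_mean, sample_median, share_below_mean)  # 10 2 0.8
```

In this toy sample a single large value pulls the mean above the median, so most entries lie below the mean. Whether the same holds for the talks has to be checked on the actual data.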

D.5 2(e)

Do at least 25% of the talks have more views than the average number of views for a talk? Justify your answer.

(4 points)

D.6 2(f)

Find the headline of the talk that received the highest number of votes in the Confusing category.

(8 points)

D.7 2(g)

Find the headline and the year_filmed of the talk that received the highest percentage of votes in the Fascinating category.

\[\text{Percentage of } \textit{Fascinating} \text{ votes for a TED talk} = \frac{\text{Number of votes in the } \textit{Fascinating} \text{ category}}{\text{Total votes in all categories}}\]

(10 points)
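
The ratio in the formula above can be checked on made-up vote counts for a single talk. In this sketch sample_votes is hypothetical, and the real dataset may store the rating categories in a different structure.

```python
# Hypothetical vote counts for one talk; names and numbers are invented
sample_votes = {'Fascinating': 30, 'Confusing': 5, 'Funny': 15}

# Numerator: votes in one category; denominator: votes in all categories
total_votes = sum(sample_votes.values())
fascinating_share = sample_votes['Fascinating'] / total_votes

print(fascinating_share)  # 0.6
```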

E 3 Poker

The object deck defined below corresponds to a deck of cards. Estimate the probability that a five-card hand will be:

  1. Straight

  2. Three-of-a-kind

  3. Two-pair

  4. One-pair

  5. High card

You may check the meaning of the above terms here.

(25 points)

Hint:

Estimate these probabilities as follows.

  1. Write a function that accepts a hand of 5 cards as an argument, and returns relevant characteristics of the hand, such as the number of distinct card values, the maximum number of occurrences of a value, etc. Using the values returned by this function (maybe in a dictionary), you can compute if the hand is of any of the above types (Straight / Three-of-a-kind / two-pair / one-pair / high card).

  2. Randomly pull a hand of 5 cards from the deck. Call the function developed in (1) to get the relevant characteristics of the hand. Use those characteristics to determine if the hand is one of the five mentioned types (Straight / Three-of-a-kind / two-pair / one-pair / high card).

  3. Repeat (2) 10,000 times.

  4. Estimate the probability of the hand being of the above five mentioned types (Straight / Three-of-a-kind / two-pair / one-pair / high card) from the results of the 10,000 simulations.

You may use the function shuffle() from the library random to shuffle the deck every time before pulling a hand of 5 cards.

You don’t need to stick to the hint if you feel you have a better way to do it. In case you have a better way, you can claim 10 bonus points for this assignment.

Code
deck = [{'value': i, 'suit': c}
        for c in ['spades', 'clubs', 'hearts', 'diamonds']
        for i in range(2, 15)]
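
The steps in the hint can be sketched for a single hand type. The example below estimates only the one-pair probability, and the helper names hand_characteristics and is_one_pair are hypothetical. The fixed seed is an assumption added only for reproducibility. It is one possible approach under these assumptions, not a reference solution.

```python
import random
from collections import Counter

# Same deck construction as in the assignment: values 2..14 in four suits
deck = [{'value': value, 'suit': suit}
        for suit in ['spades', 'clubs', 'hearts', 'diamonds']
        for value in range(2, 15)]


def hand_characteristics(hand):
    """Summarize a 5-card hand (hypothetical helper for illustration)."""
    value_counts = Counter(card['value'] for card in hand)
    return {
        'distinct_values': len(value_counts),
        'max_occurrences': max(value_counts.values()),
    }


def is_one_pair(hand):
    """Exactly one pair: four distinct values among the five cards."""
    traits = hand_characteristics(hand)
    return traits['distinct_values'] == 4 and traits['max_occurrences'] == 2


random.seed(0)  # fixed seed only so that the sketch is reproducible
num_trials = 10_000
one_pair_count = 0

for _ in range(num_trials):
    random.shuffle(deck)  # shuffle in place, then take the top five cards
    hand = deck[:5]
    if is_one_pair(hand):
        one_pair_count += 1

one_pair_estimate = one_pair_count / num_trials
print(one_pair_estimate)
```

The other hand types need further characteristics, for example the spread between the lowest and highest value for a straight, and a decision on how to treat hands that also qualify as a higher-ranked type.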